Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition

نویسندگان

Lin Wu

Yang Wang

چکیده

Fine-grained visual recognition typically depends on modeling subtle difference from object parts. However, these parts often exhibit dramatic visual variations such as occlusions, viewpoints, and spatial transformations, making it hard to detect. In this paper, we present a novel attention-based model to automatically, selectively and accurately focus on critical object regions with higher importance against appearance variations. Given an image, two different Convolutional Neural Networks (CNNs) are constructed, where the outputs of two CNNs are correlated through bilinear pooling to simultaneously focus on discriminative regions and extract relevant features. To capture spatial distributions among the local regions with visual attention, soft attention based spatial LongShort Term Memory units (LSTMs) are incorporated to realize spatially recurrent yet visually selective over local input patterns. All the above intuitions equip our network with the following novel model: two-stream CNN layers, bilinear pooling layer, spatial recurrent layer with location attention are jointly trained via an end-to-end fashion to serve as the part detector and feature extractor, whereby relevant features are localized and extracted attentively. We show the significance of our network against two well-known visual recognition tasks: fine-grained image classification and person re-identification.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

BoxCars: Improving Vehicle Fine-Grained Recognition using 3D Bounding Boxes in Traffic Surveillance

In this paper, we focus on fine-grained recognition of vehicles mainly in traffic surveillance applications. We propose an approach orthogonal to recent advancement in fine-grained recognition (automatic part discovery, bilinear pooling). Also, in contrast to other methods focused on fine-grained recognition of vehicles, we do not limit ourselves to frontal/rear viewpoint but allow the vehicles...

متن کامل

Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification with Convolutional Neural Networks

Text in natural images contains rich semantics that are often highly relevant to objects or scene. In this paper, we focus on the problem of fully exploiting scene text for visual understanding. The main idea is combining word representations and deep visual features into a globally trainable deep convolutional neural network. First, the recognized words are obtained by a scene text reading sys...

متن کامل

Action Recognition using Visual Attention

We propose a soft attention based model for the task of action recognition in videos. We use multi-layered Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units which are deep both spatially and temporally. Our model learns to focus selectively on parts of the video frames and classifies videos after taking a few glimpses. The model essentially learns which parts in the fram...

متن کامل

Beyond Object Recognition: Visual Sentiment Analysis with Deep Coupled Adjective and Noun Neural Networks

Visual sentiment analysis aims to automatically recognize positive and negative emotions from images. There are three main challenges, including large intra-class variance, fine-grained image categories, and scalability. Most existing methods predominantly focus on one or two challenges, which has limited their performance. In this paper, we propose a novel visual sentiment analysis approach wi...

متن کامل

Brain Age Prediction Based on Resting-State Functional Connectivity Patterns Using Convolutional Neural Networks

Brain age prediction based on neuroimaging data could help characterize both the typical brain development and neuropsychiatric disorders. Pattern recognition models built upon functional connectivity (FC) measures derived from resting state fMRI (rsfMRI) data have been successfully used to predict the brain age. However, most existing studies focus on coarse-grained FC measures between brain r...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1709.05769 شماره

صفحات -

تاریخ انتشار 2017

Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition

نویسندگان

چکیده

منابع مشابه

BoxCars: Improving Vehicle Fine-Grained Recognition using 3D Bounding Boxes in Traffic Surveillance

Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification with Convolutional Neural Networks

Action Recognition using Visual Attention

Beyond Object Recognition: Visual Sentiment Analysis with Deep Coupled Adjective and Noun Neural Networks

Brain Age Prediction Based on Resting-State Functional Connectivity Patterns Using Convolutional Neural Networks

عنوان ژورنال:

اشتراک گذاری